
## Introduction
This is the anonymous source code for **ICLR26 submission #14484**.  
Our code is built upon the open-source project [rLLM](https://github.com/agentica-project/rllm). Before training, copy the training data into the `rllm/data/` folder. Please do not distribute.

## Getting Started 🎯
### Installation
```bash
# Create and activate a Python 3.10 environment.
conda create -n rllm python=3.10 -y
conda activate rllm

# Install rLLM and its dependencies (including the bundled verl package).
cd rllm
pip install -e ./verl
pip install -e .
```
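
After installation, it can help to confirm that the active interpreter really is the Python 3.10 one from the `rllm` environment. The following is a minimal sketch; the helper name `is_python_310` is hypothetical and not part of this repository.

```bash
# Hypothetical helper (not part of the repository): succeeds only if the
# version string passed as the first argument is "3.10" or "3.10.x".
is_python_310() {
    case "$1" in
        3.10|3.10.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_python_310 "$(python -c 'import platform; print(platform.python_version())')" && echo "Python 3.10 active"` prints the message only when the environment's interpreter is a 3.10 release.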

### Data
Our raw training data is in `rllm/data/[train|test]/[code|math]/`, along with preprocessing scripts in `rllm/data/preprocess`. To convert the raw data into Parquet files for training, run:

```bash
# Download the datasets from Google Drive; this populates rllm/data/[train|test]/[math|code]/*.json
python scripts/data/download_datasets.py

# Generate Parquet files in data/*.parquet (run the script for the dataset you need)
python scripts/data/deepcoder_dataset.py   # DeepCoder
python scripts/data/deepscaler_dataset.py  # DeepScaleR
```
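
Before launching training, you may want to check that the preprocessing step actually produced Parquet files. The sketch below counts them; the helper name `count_parquet` is hypothetical, and the default directory `data` is taken from the comment above, so adjust it if your output lands elsewhere.

```bash
# Hypothetical helper (not part of the repository): print the number of
# *.parquet files located directly inside the given directory.
# Defaults to "data"; prints 0 if the directory is missing or empty.
count_parquet() {
    dir="${1:-data}"
    find "$dir" -maxdepth 1 -type f -name '*.parquet' 2>/dev/null | wc -l | tr -d ' '
}
```

Running `count_parquet data` from the directory where you ran the preprocessing scripts should print a non-zero number if the conversion succeeded.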

### Training Scripts

We provide training scripts for both DeepCoder and DeepScaleR models in `scripts/e1-math/`. For example:

```bash
sh ./scripts/e1-math/e1_math_1.5b_1k_1k.sh
```
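
Training runs can be long, so keeping a copy of the console output is often useful. The sketch below wraps any command so that its output is shown and also saved to a file; the helper name `run_logged` and the log file name are assumptions, not something the repository provides.

```bash
# Hypothetical helper (not part of the repository): run a command while
# mirroring its stdout and stderr to a log file.
# Usage: run_logged <log_file> <command> [args...]
# Note: the return status is that of `tee`, not of the wrapped command.
run_logged() {
    log_file="$1"
    shift
    "$@" 2>&1 | tee "$log_file"
}
```

For example, `run_logged train.log sh ./scripts/e1-math/e1_math_1.5b_1k_1k.sh` starts the training script shown above and writes everything it prints to `train.log`.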
